Improving K-NN Internet Traffic Classification Using Clustering and Principle Component Analysis

Abstract

K-Nearest Neighbour (K-NN) is a popular classification algorithm, and in this research it is used to classify internet traffic. K-NN is suitable for large amounts of data and gives accurate classification, but it is computationally expensive because it calculates the distance from each query to every record in the dataset. Clustering is one way to overcome this weakness. The clustering step is carried out before K-NN classification and does not need much computing time to group data that share the same characteristics. Fuzzy C-Means is the clustering algorithm used in this research. According to the author, Fuzzy C-Means does not require the number of clusters to be fixed in advance; the clusters form naturally from the input dataset. Its weakness is that the clustering results often differ between runs even when the input dataset is the same, because the initial dataset given to Fuzzy C-Means is not optimal. Optimizing the initial dataset requires a feature selection algorithm, that is, a method that produces an optimal initial dataset for Fuzzy C-Means. The feature selection algorithm in this research is Principal Component Analysis (PCA). PCA removes non-significant attributes or features to create an optimal dataset and can improve the performance of the clustering and classification algorithms. The result of this research is that combining classification, clustering, and feature selection on an internet traffic dataset successfully yields an internet traffic classification method with higher accuracy and faster performance.
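The cluster-then-classify idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the two-feature toy data, the labels `web` and `p2p`, the helper names, and the farthest-first choice of starting centres are all hypothetical. The sketch passes the cluster count `c` explicitly, as the standard fuzzy c-means formulation does, and it leaves out the PCA step for brevity.

```python
from collections import Counter, defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, c):
    """Deterministic starting centres (an assumption of this sketch)."""
    centres = [points[0]]
    while len(centres) < c:
        centres.append(max(points, key=lambda p: min(dist2(p, q) for q in centres)))
    return centres


def fuzzy_c_means(points, c, m=2.0, iters=30):
    """Minimal fuzzy c-means with fuzzifier m; returns the c cluster centres."""
    centres = farthest_first(points, c)
    dim = len(points[0])
    for _ in range(iters):
        powered = []  # u_ij ** m for every point i and cluster j
        for p in points:
            d = [max(dist2(p, q), 1e-12) for q in centres]
            # d holds squared distances, so the usual exponent 2/(m-1) becomes 1/(m-1)
            u = [1.0 / sum((d[j] / d[k]) ** (1.0 / (m - 1.0)) for k in range(c))
                 for j in range(c)]
            powered.append([v ** m for v in u])
        # Each centre is the mean of all points weighted by u_ij ** m
        centres = [
            [sum(w[j] * p[t] for w, p in zip(powered, points))
             / sum(w[j] for w in powered) for t in range(dim)]
            for j in range(c)
        ]
    return centres


def nearest_centre(p, centres):
    """Index of the centre closest to point p."""
    return min(range(len(centres)), key=lambda j: dist2(p, centres[j]))


def build_index(points, labels, centres):
    """Group the labelled training points by their nearest cluster centre."""
    index = defaultdict(list)
    for p, y in zip(points, labels):
        index[nearest_centre(p, centres)].append((p, y))
    return index


def classify(query, index, centres, k=3):
    """K-NN majority vote restricted to the query's nearest cluster."""
    members = index.get(nearest_centre(query, centres))
    if not members:  # empty cluster: fall back to searching every point
        members = [item for group in index.values() for item in group]
    ranked = sorted(members, key=lambda item: dist2(query, item[0]))
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]


# Hypothetical toy flows with two features each, forming two separated groups
web = [[0.1 * i, 0.2 * (i % 3)] for i in range(10)]
p2p = [[5 + 0.1 * i, 5 + 0.2 * (i % 3)] for i in range(10)]
train = web + p2p
labels = ["web"] * 10 + ["p2p"] * 10

centres = fuzzy_c_means(train, c=2)
index = build_index(train, labels, centres)
print(classify([0.3, 0.1], index, centres))
print(classify([5.2, 5.3], index, centres))
```

Here each query is compared only with the training points in its nearest cluster rather than with the whole dataset, which is the source of the speed-up the abstract describes. A feature-reduction step such as PCA would be applied to the feature vectors before `fuzzy_c_means` is called.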

Bibliographic details

  • Author

    Paramita, Adi Suryaputra

  • Author affiliation
  • Year 2017
  • Total pages
  • Original format PDF
  • Language EN
  • CLC classification
